Scalable Collaborative Filtering based on Latent Semantic Indexing
Abstract
Nearest-neighbor collaborative filtering (CF) algorithms are gaining widespread acceptance in recommender systems and e-commerce applications. User ratings are not expected to be independent, as users follow trends of similar rating behavior. In terms of text mining, this is analogous to the formation of higher-level concepts from plain terms. In this paper, we propose a novel CF algorithm which uses Latent Semantic Indexing (LSI) to detect rating trends and performs recommendations according to them. We perform an extensive experimental evaluation with two real data sets and produce results that indicate its superiority over existing CF algorithms.

Introduction

The “information overload” problem affects our everyday experience while searching for valuable knowledge. To overcome this problem, we often rely on suggestions from others who have more experience on a topic. On the Web, this becomes more manageable with the introduction of Collaborative Filtering (CF), which provides recommendations based on the suggestions of users who have similar preferences.

Two types of CF algorithms have been proposed in the literature: memory-based algorithms, which recommend according to the preferences of nearest neighbors, and model-based algorithms, which recommend by first developing a model of user ratings. Related research has reported that memory-based algorithms (a.k.a. nearest-neighbor algorithms) present excellent performance in terms of accuracy. Their basic drawback is that they do not cope well with scalability and sparsity: they face performance problems when the volume of data is very large and sparse.

Latent Semantic Indexing (LSI) has been extensively used in information retrieval to detect the latent semantic relationships between terms and documents. LSI constructs a low-rank approximation to the term-document matrix. As a result, it produces a matrix that is less noisy than the original one.
Thus, higher-level concepts are generated from plain terms. In CF, this is analogous to the formation of users’ trends from individual preferences.

This work was conducted while the first two authors were scholars of the State Scholarships Foundation of Greece (IKY). Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

In this paper, we propose a new algorithm that is based on LSI to produce a condensed model for the user-item matrix. This model comprises a matrix that captures the main user trends and presents a two-fold advantage: (i) it removes noise by focusing on the main rating trends and not on the particularities of each individual user; (ii) its size is much smaller than that of the original matrix, so it can speed up the search for similar users/items.

Our contribution and novelty are summarized as follows: (i) Borrowing from Information Retrieval, we include the pseudo-user concept in order to compare it with our processed data. This distinguishes our method from related work (Sarwar et al. 2000b), where Singular Value Decomposition (SVD) methods have been used only to summarize the user-item matrix for dimensionality reduction. (ii) We implement a novel algorithm, which tunes the number of principal components according to the data characteristics. (iii) We generalize the recommendation procedure for both user- and item-based CF methods. (iv) We generate predictions based on the users’ neighbors and not based on the test user itself, as has been reported in related work so far. (v) We propose a new top-N list generation algorithm based on SVD and the Highest Prediction Rated items.

The rest of this paper is organized as follows. We summarize the related work and analyze the CF factors. We describe the proposed approach and give experimental results. Finally, we conclude this paper.
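The ideas outlined in this introduction (a condensed low-rank model of the user-item matrix, a pseudo-user projected into it, and predictions taken from the neighbors) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes NumPy, a made-up 5x4 rating matrix in which 0 means "unrated", the standard LSI folding-in formula r V_k S_k^{-1} for the pseudo-user, cosine similarity in the reduced space, and a retained-energy threshold of 0.9 as a stand-in for the paper's own tuning of the number of components. All function names are hypothetical.

```python
import numpy as np

def choose_rank(singular_values, energy=0.9):
    """Smallest k whose leading singular values retain `energy` of the squared spectrum.
    Assumed heuristic; the paper's own tuning rule is not given in this section."""
    squared = singular_values ** 2
    cumulative = np.cumsum(squared) / np.sum(squared)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(singular_values))

def fold_in(ratings_row, vt_k, s_k):
    """Project a rating vector into the k-dimensional space (standard LSI folding-in)."""
    return ratings_row @ vt_k.T / s_k

def nearest_neighbors(query, user_factors, n):
    """Indices and cosine similarities of the n users closest to `query`."""
    norms = np.linalg.norm(user_factors, axis=1) * np.linalg.norm(query)
    sims = user_factors @ query / np.where(norms == 0, 1.0, norms)
    order = np.argsort(-sims)[:n]
    return order, sims[order]

def predict_from_neighbors(item, neighbors, sims, ratings):
    """Similarity-weighted mean of the neighbors' ratings for `item` (0 = unrated)."""
    rated = ratings[neighbors, item] > 0
    weights = np.clip(sims[rated], 0.0, None)
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[neighbors[rated], item] / weights.sum())

# Hypothetical toy user-item matrix: rows are users, columns are items.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
    [5.0, 5.0, 0.0, 0.0],
])

u, s, vt = np.linalg.svd(R, full_matrices=False)
k = choose_rank(s)                 # number of retained components
user_factors = u[:, :k]            # condensed model: one k-vector per user

new_user = np.array([5.0, 0.0, 0.0, 1.0])        # has not rated item 1
pseudo_user = fold_in(new_user, vt[:k], s[:k])   # same space as user_factors
neighbors, sims = nearest_neighbors(pseudo_user, user_factors, n=3)
prediction = predict_from_neighbors(1, neighbors, sims, R)
print(k, sorted(neighbors.tolist()), round(prediction, 2))
```

Because folding in an existing row of R reproduces that user's row of the truncated U matrix, the pseudo-user and the stored users are directly comparable. Treating unrated entries as zeros before the SVD is a simplification made for brevity.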
Similar resources
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
CNS: A Task-based Hybrid Collaborative Filtering Recommender Service
In this paper, we introduce a task-based hybrid collaborative filtering (CF) recommender service to support users’ information gathering tasks. Even with the best web search engines (WSEs) and the most effective query formulations, information gathering tasks require people to work through long lists of documents to determine potentially relevant documents. We propose and implement Singular Val...
Optimization of Collaborative Filtering Systems
A collaborative filtering system (CFS) makes recommendations to users via the similarity and proximity between profiles, taking into account their historical valuations. In contrast to most CFS, which follow the user-based approach, we adopt the item-based approach to improve the quality of recommendation. This process seems flexible and allowed us to integrate other sourc...
Latent Dirichlet Allocation
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where t...
An Effective Web Service Selection based on Hybrid Collaborative Filtering and QoS-Trust Evaluation
Web Service mining has become one of the predominant areas of Service Oriented Architecture. Web service discovery methods include syntactic-based and semantic-based systems. In the proposed work, both the syntactic and the semantic approach are followed. The most widely used recommender technique is collaborative filtering. In this paper, we propose an architecture for Web Service Selection b...
Publication date: 2006